scikit-learn: Hamming loss

Compute the average Hamming loss.

The Hamming loss is the fraction of labels that are incorrectly predicted.

In other words, it is the share of labels the model got wrong, so smaller is better (0 when there is not a single mistake).

In multilabel classification, the Hamming loss is different from the subset zero-one loss.

The zero-one loss considers the entire set of labels for a given sample incorrect if it does not entirely match the true set of labels.

In other words, under the zero-one loss a sample counts as wrong unless its predicted label set matches the true label set exactly.

Hamming loss is more forgiving in that it penalizes only the individual labels.

Because it penalizes only the individual labels that are wrong, the Hamming loss is the more lenient of the two.
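
To make the contrast concrete, here is a minimal sketch in plain Python. The two helper functions are hypothetical names written for this note, not scikit-learn functions, and they assume the labels are given as equal-length rows of 0/1 indicators.

```python
# Hand-rolled versions of the two losses for multilabel indicator rows.
# The helper names are hypothetical; they are not scikit-learn functions.

def hamming_loss_manual(y_true, y_pred):
    """Fraction of individual labels that differ, counted over all samples."""
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            wrong += int(t != p)
            total += 1
    return wrong / total


def subset_zero_one_loss_manual(y_true, y_pred):
    """Fraction of samples whose whole label set is not an exact match."""
    wrong = sum(int(list(t) != list(p)) for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


y_true = [[0, 1], [1, 1]]
y_pred = [[0, 0], [1, 1]]  # one label wrong, in the first sample only

hamming = hamming_loss_manual(y_true, y_pred)
subset = subset_zero_one_loss_manual(y_true, y_pred)

print(hamming)  # 0.25: 1 of 4 labels is wrong
print(subset)   # 0.5: 1 of 2 samples is not an exact match
```

A single wrong label costs the first sample its whole score under the zero-one loss, while the Hamming loss charges only for that one label.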

The hamming_loss computes the average Hamming loss or Hamming distance between two sets of samples.

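Hamming loss for one pair of samples (the prediction y^ and the truth y):

y^_j: the predicted value of the j-th label of the sample

y_j: the corresponding true value (of the j-th label)

1(x) is the indicator function: 1 if x holds, 0 otherwise

Here x is y^_j != y_j, so each label the model got wrong adds 1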
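
Written out, the definitions above correspond to the following per-sample formula. This is a reconstruction from the symbols in this note; `n_labels` (the number of labels per sample) is a name assumed here, since the note does not give one.

```latex
L_{\text{Hamming}}(y, \hat{y}) = \frac{1}{n_{\text{labels}}} \sum_{j=0}^{n_{\text{labels}} - 1} 1(\hat{y}_j \neq y_j)
```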

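If y^ and y disagree on every label, the Hamming loss is 1

If y^ and y agree on every label, the Hamming loss is 0

With 5 labels, of which 2 match (so 3 are wrong), the Hamming loss is 3/5 = 0.6

This is then averaged over all samples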
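
The arithmetic above can be checked with a short sketch in plain Python. The helper `sample_hamming_loss` is a hypothetical name used only for illustration, and the label values are made up to reproduce the 5-label case.

```python
# Hypothetical helper illustrating the description above; not scikit-learn code.

def sample_hamming_loss(y_true_row, y_pred_row):
    """Hamming loss for one sample: mismatched labels / number of labels."""
    mismatches = sum(1 for t, p in zip(y_true_row, y_pred_row) if t != p)
    return mismatches / len(y_true_row)


# One sample with 5 labels: 2 match, 3 are wrong -> 3/5
one_sample = sample_hamming_loss([1, 0, 1, 1, 0], [1, 0, 0, 0, 1])

# Average the per-sample losses over all samples.
y_true = [[1, 0, 1, 1, 0], [1, 1, 0, 0, 1]]
y_pred = [[1, 0, 0, 0, 1], [1, 1, 0, 0, 1]]  # second sample is fully correct
per_sample = [sample_hamming_loss(t, p) for t, p in zip(y_true, y_pred)]
average = sum(per_sample) / len(per_sample)

print(one_sample)  # 0.6
print(average)     # 0.3
```

Averaging per-sample losses gives the same result as dividing all wrong labels by all labels here because every sample has the same number of labels.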

code:hamming_loss.py

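>>> import numpy as np

>>> from sklearn import metrics

>>> metrics.hamming_loss([2, 2, 3, 4], [1, 2, 3, 4]) # only index 0 is wrong: y_true is 2, y_pred is 1

0.25

>>> metrics.hamming_loss(np.array([[0, 1], [1, 1]]), np.zeros((2, 2))) # simulating the multilabel case

0.75 # y_pred is [[0, 0], [0, 0]]; 3 of the 4 labels are wrong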